A Declarative Query Processing System for Nowcasting
نویسندگان
چکیده
Nowcasting is the practice of using social media data to quantify ongoing real-world phenomena. It has been used by researchers to measure flu activity, unemployment behavior, and more. However, the typical nowcasting workflow requires either slow and tedious manual searching of relevant social media messages or automated statistical approaches that are prone to spurious and low-quality results. In this paper, we propose a method for declaratively specifying a nowcasting model; this method involves processing a user query over a very large social media database, which can take hours. Due to the human-in-the-loop nature of constructing nowcasting models, slow runtimes place an extreme burden on the user. Thus we also propose a novel set of query optimization techniques, which allow users to quickly construct nowcasting models over very large datasets. Further, we propose a novel query quality alarm that helps users estimate phenomena even when historical ground truth data is not available. These contributions allow us to build a declarative nowcasting data management system, RaccoonDB, which yields high-quality results in interactive time. We evaluate RaccoonDB using 40 billion tweets collected over five years. We show that our automated system saves work over traditional manual approaches while improving result quality—57% more accurate in our user study—and that its query optimizations yield a 424x speedup, allowing it to process queries 123x faster than a 300-core Spark cluster, using only 10% of the computational resources.
منابع مشابه
انتخاب مناسبترین زبان پرسوجو برای استفاده از فراپیوندها جهت استخراج دادهها در حالت دیتالوگ در سامانه پایگاه داده استنتاجی DES
Deductive Database systems are designed based on a logical data model. Data (as opposed to Relational Databases Management System (RDBMS) in which data stored in tables) are saved as facts in a Deductive Database system. Datalog Educational System (DES) is a Deductive Database system that Datalog mode is the default mode in this system. It can extract data to use outer joins with three query la...
متن کاملHorton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs
Horton+ is a graph query processing system that executes declarative reachability queries on a partitioned attributed multi-graph. It employs a query language, query optimizer, and a distributed execution engine. The query language expresses declarative reachability queries, and supports closures and predicates on node and edge attributes to match graph paths. We introduce three algebraic opera...
متن کاملXML Query Processing in XTC
In the past, the development of a declarative, set-based interface to access data in a DBMS was a key factor for the success of database systems. For XML, the lingua franca for declarative data access is XQuery. This paper summarizes the XQuery processing concepts that have been developed in the XTC system (the XML Transaction Coordinator)—a native XML database management system. We step throug...
متن کاملQuery Decomposition Using the XML Declarative Description Language
Query decomposition is one of the most important phases of query processing in an integrated database system. A global query is decomposed into several sub-queries conforming to local formats, which can be used to extract data from distributed databases. In this paper a new query decomposition methodology for integrated XML databases is introduced. A special construction of mappings is also int...
متن کاملDeclarative Semantics in Object-Oriented Software Development - A Taxonomy and Survey
One of the modern paradigms to develop an application is object oriented analysis and design. In this paradigm, there are several objects and each object plays some specific roles in applications. In an application, we must distinguish between procedural semantics and declarative semantics for their implementation in a specific programming language. For the procedural semantics, we can write a ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- PVLDB
دوره 10 شماره
صفحات -
تاریخ انتشار 2016